Inherited Feature-based Similarity Measure Based on Large Semantic Hierarchy and Large Text Corpus

Authors

  • Hideki Hirakawa
  • Zhonghui Xu
  • Kenneth B. Haase
Abstract

We describe a similarity calculation model called IFSM (Inherited Feature Similarity Measure) between objects (words/concepts) based on their common and distinctive features. We propose an implementation method for obtaining features based on abstracted triples extracted from a large text corpus utilizing taxonomical knowledge. This model represents an integration of traditional methods, i.e., relation-based similarity measure and distribution-based similarity measure. An experiment using our new concept abstraction method, which we call the flat probability grouping method, over 80,000 surface triples shows that the abstraction level of 3000 is a good basis for feature description.
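
The abstract does not give the IFSM formula, so the following is only a generic illustration of scoring similarity from common and distinctive features. It is a minimal sketch in the spirit of Tversky's ratio model, not the authors' method. The function name `feature_similarity`, the weights `alpha` and `beta`, and the toy feature sets are all assumptions made for illustration.

```python
def feature_similarity(a, b, alpha=0.5, beta=0.5):
    """Ratio-style similarity over two feature sets (illustrative only).

    common   = features shared by both objects
    only_a/b = distinctive features of each object
    alpha and beta are hypothetical weights on the distinctive features.
    """
    common = len(a & b)
    only_a = len(a - b)
    only_b = len(b - a)
    denom = common + alpha * only_a + beta * only_b
    # Two empty feature sets carry no evidence; return 0.0 by convention.
    return common / denom if denom else 0.0


# Toy feature sets, invented for this example.
dog = {"animate", "has_legs", "barks"}
cat = {"animate", "has_legs", "meows"}
car = {"inanimate", "has_wheels"}

print(feature_similarity(dog, cat))  # shares two of three features
print(feature_similarity(dog, car))  # shares no features
```

With equal weights the score is symmetric. Choosing different values for `alpha` and `beta` makes it directional, which is one common reason for using a feature-based formulation.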

Similar resources

Partial Measure of Semantic Relatedness Based on the Local Feature Selection

A corpus-based Measure of Semantic Relatedness can be calculated for every pair of words occurring in the corpus, but it can produce erroneous results for many word pairs due to accidental associations derived on the basis of several context features. We propose a novel idea of a partial measure that assigns relatedness values only to word pairs well enough supported by corpus data. Three simple...

A Comparative Study of Ontology Based Term Similarity Measures on PubMed Document Clustering

Recent research shows that ontology as background knowledge can improve document clustering quality with its concept hierarchy knowledge. Previous studies take term semantic similarity as an important measure to incorporate domain knowledge into clustering process such as clustering initialization and term re-weighting. However, not many studies have been focused on how different types of term ...

Efficient Hybrid Semantic Text Similarity using Wordnet and a Corpus

Text similarity plays an important role in natural language processing tasks such as answering questions and summarizing text. At present, state-of-the-art text similarity algorithms rely on inefficient word pairings and/or knowledge derived from large corpora such as Wikipedia. This article evaluates previous word similarity measures on benchmark datasets and then uses a hybrid word similarity...

Corpus-based and Knowledge-based Measures of Text Semantic Similarity

This paper presents a method for measuring the semantic similarity of texts, using corpus-based and knowledge-based measures of similarity. Previous work on this problem has focused mainly on either large documents (e.g. text classification, information retrieval) or individual words (e.g. synonymy tests). Given that a large fraction of the information available today, on the Web and elsewhere,...

The Relevance of Spatial Relation Terms and Geographical Feature Types

Spatial relation terms can generally indicate spatial relations described in natural language context. Their semantic representation is closely related to geographical entities and their characteristics, e.g., geometry, scale, and geographical feature types. This paper proposes a quantitative approach to explore the semantic relevance of spatial relation terms and geographical feature types of geo...

Publication date: 1996